These will eventually be incorporated into tbeptools; for now sourcing the .R files
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
# to download Entero data at specified stationssource(here::here("R", "read_importwqpStationsAndStuff.R"))# to download precip data from SWFWMDsource(here::here("R", "raindl.R"))# to identify wet/dry fib samples source(here::here("R", "anlz_fibwetdry.R"))
Download things
Enterococcus data
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
stns <-unique(catch_pixels$station)stns[which(stns =="21FLCOSP_WQX-CENTRAL *")] <-"21FLCOSP_WQX-CENTRAL CANAL"# define argumentsstations <- stns # can subset if desiredentero_names <-c("Enterococci","Enterococcus")startDate <-as.Date("2018-01-01")endDate <-as.Date("2022-12-31")
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
# put them together# date format has to be mm-dd-yyyy for the urlargs <-list(siteid = stations,characteristicName = entero_names,startDateLo =format(startDate, "%m-%d-%Y"),startDateHi =format(endDate, "%m-%d-%Y"))
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
Some of the FIB stations are in the same catchments used for rain data - when calculating FIB amounts, we want them included; but when calculating rain amounts, we don’t want to falsely replicate them. So define which ones should be removed for calculating rain stats:
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
Visualize Entero concentrations under various thresholds
These are violin plots - like boxplots, but showing the shape of the distribution - with quantiles drawn as horizontal lines (25th %ile; median; 75th %ile)
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
It looks like the shape doesn’t change too much by varying the amount of precip over the same number of days. Let’s look at a half-inch threshold over each of the days:
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
(grps[[2]] + grps[[6]]) / (grps[[10]] + grps[[14]]) +plot_annotation(title ="0.5 inches over varying # of days")
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
It’s hard to compare these because of the NAs for wet-sample throwing off the x-axis. I’ve kept those because it’s important to notice that we’re getting NAs when we do this - presumably only at the beginning of the entire time series when we don’t have data from the days before sampling because of the year cutoff. Important to note that we need to download precip data for at least several days before FIB samples - e.g. if you want to look at FIBs from 2018-2023, download precip data starting in mid-December 2017, to avoid those NAs.
Removing unidentified samples
Make those graphs again but cutting out the NAs now.
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
outs_graphics_conc2 <-list()for(i in1:nrow(definition_combos)){ wetthrs <- definition_combos$wet_threshold[i] tmpthrs <- definition_combos$temporal_threshold[i] fib_out <-anlz_fibwetdry(fibdat, prcpdat, temporal_window = tmpthrs, wet_threshold = wetthrs) |>filter(!is.na(wet_sample))# sample information# ------------------------------------------------------------------------# number wet samples nwet <-sum(fib_out$wet_sample, na.rm =TRUE)# number dry samples ndry <-sum(fib_out$wet_sample ==FALSE, na.rm =TRUE)# number samples nsamps <-nrow(fib_out)# proportion of samples that are wet, out of those that could be ID'd: prop_wet <- nwet / (nwet + ndry)# percent wet pct_wet <-round(prop_wet*100, 1)# number stations nstns <-length(unique(fib_out$station))# graphs# -------------------------------------------------------------------------- outs_graphics_conc2[[i]] <-ggplot(fib_out, aes(x = wet_sample,y = FIBconc,fill = wet_sample)) +geom_violin(draw_quantiles =c(0.25, 0.5, 0.75),alpha =0.3) +scale_y_log10() +labs(subtitle = glue::glue("\n{pct_wet}% of {nsamps} samples across {nstns} stations 'wet' \n'wet' threshold {wetthrs} inches over {tmpthrs} days")) +theme(legend.position ="none")}
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
grps <- outs_graphics_conc2# over one day(grps[[1]] + grps[[2]]) / (grps[[3]] + grps[[4]]) +plot_annotation(title ="varying precip over 1 day")
# 0.75 in(grps[[3]] + grps[[7]]) / (grps[[11]] + grps[[15]]) +plot_annotation(title ="0.75 inches over varying # of days")
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Distribution of concentrations really doesn’t change too much, visually. It’s the top end of wet samples that changes shape a bit, and the percent of samples that get classified as wet.
~18% of samples get classified as ‘wet’ with 0.5 inch over 2 days or 0.75 inches over 3 days.
~25% of samples are classified as ‘wet’ with 0.5 inch over 3 days.
These numbers all feel pretty reasonable.
Proportion of Exceedances?
Does this change?
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
ggplot(filter(long, wet_threshold %in%c(0.5, 0.75), temporal_window %in%c(2, 3)), aes(x = sampleType,y = propExceedances,col =as.factor(wet_threshold),shape =as.factor(temporal_window))) +geom_point(size =3,alpha =0.8) +labs(title ="Exceedances",subtitle ="only 0.5 or 0.75 inches",x ="Sample Type",y ="proportion of samples over 70 CFU/100mL",shape ="# days",col ="inches of precip")
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
The biggest jumps of exceedance seem to happen by increasing the number of inches required to classify a sample as ‘wet’. The difference seems to be around 10% between 4 days and 1 day within each “inches” threshold. Whichever threshold combinations get used, there will be more exceedances in wet samples than dry.
Half an inch over 2 days seems essentially the same across all these metrics as 0.75 inches over 3 days.
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Again, 0.5 inches over 2 days is pretty similar to 0.75 inches over 3 days. The others have pretty different geometric means and number of samples classified as wet.
We need a certain amount of samples to even be considered wet before we can reliably distinguish - e.g. if only 5% of samples are wet, we can’t really say much. So what have we got? The histogram shows proportions (e.g. 0.1 = 10%).
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
Code
These examples show a large range of options for number of preceding days and cumulative precipitation to assess the “response surface” for samples defined as wet or dry and their differences. The goal is to better understand how the selected options can discriminate between samples or to determine when the selected options are not useful. Every combination of options from 0.25 to 4 inches of precipitation increasing by 0.25 inches over 1 to 14 days are evaluated for a total of 224 combinations.
ggplot(res, aes(x =factor(wet_threshold), y =factor(temporal_window))) +geom_tile(aes(fill = meddiff), color ='black') +geom_text(aes(label =round(meddiff, 0)), size =3, color ='white') +scale_fill_viridis_c() +scale_x_discrete(expand =c(0, 0)) +scale_y_discrete(expand =c(0, 0)) +theme(legend.position ='top') +labs(y ="Temporal Window (days)",x ="Wet Threshold (inches)",fill ='Diff. in median between wet/dry' )
Code
ggplot(res, aes(x =factor(wet_threshold), y =factor(temporal_window))) +geom_tile(aes(fill = propExceedances_wet), color ='black') +geom_text(aes(label =round(propExceedances_wet, 2)), size =3, color ='white') +scale_fill_viridis_c() +scale_x_discrete(expand =c(0, 0)) +scale_y_discrete(expand =c(0, 0)) +theme(legend.position ='top') +labs(y ="Temporal Window (days)",x ="Wet Threshold (inches)",fill ='Prop. exceedances in wet samples' )
Code
toplo <- res |>select(wet_threshold, temporal_window, pct_wet, meddiff, propExceedances_wet) |>pivot_longer(-c(wet_threshold, temporal_window),names_to ="stat",values_to ="value") |>mutate(stat =factor(stat, levels =c('pct_wet', 'meddiff', 'propExceedances_wet'), labels =c('Percent wet samples', 'Diff. in median between wet/dry', 'Prop. exceedances in wet samples')) )ggplot(toplo, aes(x = wet_threshold, y = value, group =factor(temporal_window), color = temporal_window)) +geom_line() +scale_color_viridis_c() +theme_minimal() +theme(legend.position ='top', strip.placement ='outside' ) +facet_wrap(~stat, scales ='free_y', strip.position ='left') +labs(y =NULL,x ="Wet Threshold (inches)",color ='Temporal Window (days)' )
These examples show competing objectives for defining wet/dry samples, e.g., maximizing median differences while creating a balance between wet/dry samples. For example, the maximum medium difference has a very small percentage of wet samples because the criteria for wet samples is very strict, e.g., an excessive amount of rain in a very short period of time. The criteria can be optimized to identify which combination of wet threshold and temporal window maintains the best balance in percentage between wet/dry samples and the greatest median difference.
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843
<<<<<<< HEAD
=======
>>>>>>> 73317b75bfa7d009ad3711bcd7401729aca24843